160 research outputs found
Dev2vec: Representing Domain Expertise of Developers in an Embedding Space
Accurate assessment of the domain expertise of developers is important for
assigning the proper candidate to contribute to a project or to fill a job
role. Since potential candidates can come from a large pool, the automated
assessment of this domain expertise is a desirable goal. While previous methods
have had some success within a single software project, the assessment of a
developer's domain expertise from contributions across multiple projects is
more challenging. In this paper, we employ doc2vec to represent the domain
expertise of developers as embedding vectors. These vectors are derived from
different sources that contain evidence of developers' expertise, such as the
descriptions of the repositories to which they contributed, their issue-resolving
history, and API calls in their commits. We name this approach dev2vec and demonstrate its
effectiveness in representing the technical specialization of developers. Our
results indicate that encoding the expertise of developers in an embedding
vector outperforms state-of-the-art methods and improves the F1-score by up to
21%. Moreover, our findings suggest that the "issue-resolving history" of
developers is the most informative source of information to represent the
domain expertise of developers in embedding spaces.
Comment: 30 pages, 5 figures
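As a rough illustration of the idea of placing developers in a vector space and comparing them to a target domain, the following minimal sketch ranks developers by cosine similarity. It is not the dev2vec method: plain term-count vectors stand in for doc2vec embeddings, and the developer names, evidence text, and query are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase term counts (an assumption, NOT doc2vec)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical evidence of expertise, e.g. text drawn from resolved issues.
developers = {
    "alice": "fix crash in http server request routing handler",
    "bob": "train neural network model on image dataset",
}

# Hypothetical description of the domain a project needs.
query = "http request routing"

# Rank developers by similarity between their evidence and the query.
ranked = sorted(
    developers,
    key=lambda name: cosine(embed(developers[name]), embed(query)),
    reverse=True,
)
```

In this toy setting `ranked` lists the developer whose evidence overlaps most with the query first; a learned embedding would also capture similarity between related terms that do not match exactly.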
Alloprof: a new French question-answer education dataset and its use in an information retrieval case study
Teachers and students are increasingly relying on online learning resources
to supplement the ones provided in school. This increase in the breadth and
depth of available resources benefits students, but only if
they are able to find answers to their queries. Question-answering and
information retrieval systems have benefited from public datasets to train and
evaluate their algorithms, but most of these datasets have been in English text
written by and for adults. We introduce a new public French question-answering
dataset collected from Alloprof, a Quebec-based primary and high-school help
website, containing 29 349 questions and their explanations in a variety of
school subjects from 10 368 students, with more than half of the explanations
containing links to other questions or some of the 2 596 reference pages on the
website. We also present a case study of this dataset in an information
retrieval task. This dataset was collected on the Alloprof public forum, with
all questions verified for their appropriateness and the explanations verified
both for their appropriateness and their relevance to the question. To predict
relevant documents, architectures using pre-trained BERT models were fine-tuned
and evaluated. This dataset will allow researchers to develop
question-answering, information retrieval and other algorithms specifically for
the French-speaking education context. Furthermore, the range of language
proficiency, images, mathematical symbols and spelling mistakes will
necessitate algorithms based on multimodal comprehension. The case study we
present as a baseline shows that an approach relying on recent techniques
provides an acceptable level of performance, but more work is necessary before it
can reliably be used and trusted in a production setting.
Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing
One of the critical phases in software development is software testing.
Testing helps with identifying potential bugs and reducing maintenance costs.
The goal of automated test generation tools is to ease the development of tests
by suggesting efficient bug-revealing tests. Recently, researchers have
leveraged Large Language Models (LLMs) of code to generate unit tests. While
the code coverage of generated tests was usually assessed, the literature has
acknowledged that coverage is only weakly correlated with the effectiveness of
tests at detecting bugs. To address this limitation, in this paper, we
introduce MuTAP for improving the effectiveness of test cases generated by LLMs
in terms of revealing bugs by leveraging mutation testing. Our goal is achieved
by augmenting prompts with surviving mutants, as those mutants highlight the
limitations of test cases in detecting bugs. MuTAP is capable of generating
effective test cases in the absence of natural language descriptions of the
Program Under Test (PUT). We employ different LLMs within MuTAP and evaluate
their performance on different benchmarks. Our results show that our proposed
method is able to detect up to 28% more faulty human-written code snippets.
Among these, 17% remained undetected by both the current state-of-the-art fully
automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning
approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57%
on synthetic buggy code, outperforming all other approaches in our evaluation.
Our findings suggest that although LLMs can serve as a useful tool to generate
test cases, they require specific post-processing steps to enhance the
effectiveness of the generated test cases, which may suffer from syntactic or
functional errors and may be ineffective in detecting certain types of bugs and
testing corner cases of PUTs.
Comment: 16 pages, 3 figures
Dynamic student classification on memory networks for knowledge tracing
Pinnacle lab for analytics at Singapore Management University